Statistical Laws Governing Fluctuations in Word Use 
from Word Birth to Word Death 






Alexander M. Petersen, Joel Tenenbaum, Shlomo Havlin, and H. Eugene Stanley 

Laboratory for the Analysis of Complex Economic Systems, 

IMT Lucca Institute for Advanced Studies, Lucca 55100, Italy 

Center for Polymer Studies and Department of Physics, 

Boston University, Boston, Massachusetts 02215, USA 

Minerva Center and Department of Physics, Bar-Ilan University, Ramat-Gan 52900, Israel 

(Dated: February 16, 2012) 

We analyze the dynamic properties of $10^7$ words recorded in English, Spanish and Hebrew over the period 
1800-2008 in order to gain insight into the coevolution of language and culture. We report language-independent 
patterns useful as benchmarks for theoretical models of language evolution. A significantly decreasing (increasing) 
trend in the birth (death) rate of words indicates a recent shift in the selection laws governing word use. For 
new words, we observe a peak in the growth-rate fluctuations around 40 years after introduction, consistent with 
the typical entry time into standard dictionaries and the human generational timescale. Pronounced changes 
in the dynamics of language during periods of war show that word correlations, occurring across time and 
between words, are largely influenced by coevolutionary social, technological, and political factors. We quantify 
cultural memory by analyzing the long-term correlations in the use of individual words using detrended 
fluctuation analysis. 

Statistical laws describing the properties of word use, such 
as Zipf's law [1-6] and Heaps' law [7, 8], have been thoroughly 
tested and modeled. These statistical laws are based on 
static snapshots of written language using empirical data aggregated 
over relatively small time periods and comprised of 
relatively small corpora ranging in size from individual texts 
[1, 2] to relatively small collections of topical texts [3, 4]. 
However, language is a fundamentally dynamic complex system, 
consisting of heterogeneous entities at the level of the 
units (words) and the interacting users (us). Hence, we begin 
this paper with two questions: (i) Do languages exhibit 
dynamical patterns? (ii) Do individual words exhibit dynamical 
patterns? 

The coevolutionary nature of language requires analysis 
both at the macro and micro scale. Here we apply interdis- 
ciplinary concepts to empirical language data collected in a 
massive book digitization effort by Google Inc., which re- 
cently unveiled a database of words in seven languages, af- 
ter having scanned approximately 4% of the world's books. 
The massive "n-gram" project [9] allows for a novel view 
into the growth dynamics of word use and the birth and death 
processes of words in accordance with evolutionary selection 
laws [10]. 

A recent analysis of this database by Michel et al. [11] 
addresses numerous well-posed questions rooted in cultural 
anthropology using case studies of individual words. Here 
we take an alternative approach by analyzing the aggregate 
properties of the language dynamics recorded in the Google 
Inc. data in a systematic way, using the word counts of every 
word recorded over the 209-year time period 1800 - 2008 in 
the English, Spanish, and Hebrew text corpora. This period 
spans an incredibly rich cultural history that includes several 
international wars, revolutions, and numerous technological 
paradigm shifts. Together, the data comprise over $1 \times 10^7$ distinct 
words. We use concepts from economics to gain quantitative 
insights into the role of exogenous factors on the evolution 
of language, combined with methods from statistical 
physics to quantify the competition arising from correlations 
between words [12-14] and the memory-driven autocorrelations 
in $u_i(t)$ across time [15-17]. 

For each corpus, comprising millions of distinct words, we 
use a general word-count framework which accounts for the 
underlying growth of language over time. We first define the 
quantity $u_i(t)$ as the number of uses of word $i$ in year $t$. Since 
the number of books and the number of distinct words have 
grown dramatically over time, we define the relative word use, 
$f_i(t)$, as the fraction of uses of word $i$ out of all word uses in 
the same year, 

$$ f_i(t) = u_i(t) / N_u(t) , \qquad (1) $$

where the quantity $N_u(t) = \sum_{i=1}^{N_w(t)} u_i(t)$ is the total number 
of indistinct word uses digitized from books printed in year $t$ 
and $N_w(t)$ is the total number of distinct words digitized from 
books printed in year $t$. To quantify the dynamic properties of 
word prevalence at the micro scale and their relation to socio-political 
factors at the macro scale, we analyze the logarithmic 
growth rate commonly used in finance and economics, 



$$ r_i(t) \equiv \ln f_i(t + \Delta t) - \ln f_i(t) = \ln\left( \frac{f_i(t + \Delta t)}{f_i(t)} \right) . \qquad (2) $$
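
As a concrete illustration of Eqs. (1) and (2), the following minimal Python sketch computes the relative use and the logarithmic growth rate from yearly word counts. The function names, the dictionary layout, and the toy counts are hypothetical and serve only as an example; they are not taken from the Google data. Assigning no growth rate to years with zero use is an assumption of the sketch.

```python
import math

def relative_use(counts_by_word):
    """Eq. (1): f_i(t) = u_i(t) / N_u(t), with N_u(t) the total word uses in year t.

    counts_by_word maps each word to a list of yearly counts u_i(t);
    all lists are assumed to cover the same years.
    """
    n_years = len(next(iter(counts_by_word.values())))
    totals = [sum(series[t] for series in counts_by_word.values())
              for t in range(n_years)]
    return {
        word: [u / n if n > 0 else 0.0 for u, n in zip(series, totals)]
        for word, series in counts_by_word.items()
    }

def log_growth_rates(f, dt=1):
    """Eq. (2): r_i(t) = ln f_i(t + dt) - ln f_i(t).

    Returns None where either value is zero (assumption of this sketch).
    """
    rates = []
    for t in range(len(f) - dt):
        if f[t] > 0 and f[t + dt] > 0:
            rates.append(math.log(f[t + dt] / f[t]))
        else:
            rates.append(None)
    return rates

# Toy counts for three hypothetical words over three years.
counts = {"xray": [10, 20, 40], "radiogram": [30, 20, 10], "other": [60, 60, 50]}
f = relative_use(counts)
r_xray = log_growth_rates(f["xray"])
```

Because the yearly totals are constant in this toy example, the doubling of the counts for "xray" translates directly into a growth rate of ln 2 per year.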

The relative use $f_i(t)$ depends on the intrinsic grammatical 
utility of the word (related to the number of "proper" sentences 
that can be constructed using the word), the semantic 
utility of the word (related to the number of meanings a given 
word can convey), and other idiosyncratic details related to 
topical context. Neutral null models for the evolution of language 
define the relative use of a word as its "fitness" [18]. 

FIG. 1: Word extinction. The English word "Roentgenogram" derives 
from the Nobel prize winning scientist and discoverer of the 
x-ray, Wilhelm Röntgen (1845-1923). The prevalence of this word 
was quickly challenged by two main competitors, "X-ray" (recorded 
as "Xray" in the database) and "Radiogram." The arithmetic mean 
frequency of these three time series is relatively constant over the 
80-year period 1920-2000, $\langle f \rangle \approx 10^{-7}$, illustrating the limited 
linguistic "market share" that can be achieved by any competitor. We 
conjecture that "Xray" has a higher frequency mainly because of 
the "fitness gain" from its efficient short word length and also because 
English has become the base language for scientific 
publication. 



In such models, the word frequency is the only factor deter- 
mining the survival capacity of a word. In reality, word com- 
petition depends on more subtle features of language, such 
as the cognitive aspects of efficient communication. For example, 
the emergence of robust categorical naming patterns 
observed across many cultures is regarded to be the result of 
complex discrimination tactics shared by intelligent communicators. 
This is evident in the finite set of words describing 
the continuous spectrum of color names, emotional states, and 
other categorical sets [19-21]. 

In our analysis we treat words with equivalent meanings 
but with different spellings (e.g. color versus colour) as distinct 
words, since we view the competition among synonyms 
and alternative spellings in the linguistic arena as a key ingredient 
in complex evolutionary dynamics [10, 22]. For instance, 
with the advent of automatic spell-checkers in the digital 
era, words recognized by spell-checkers receive a significant 
boost in their "reproductive fitness" at the expense of 
their misspelled or unstandardized counterparts. 

In the linguistic arena, not only "defective" words 
die; even significantly used words can become extinct. 
Fig. 1 shows three once-significant words: "Radiogram," 
"Roentgenogram," and "Xray". These words compete for the 
majority share of nouns referring to what is now commonly 
known as an "X-ray" (note that such dashes are discarded in 
Google's digitization process). The word "Roentgenogram" 
has since become extinct, even though it was the most common 
term for several decades in the 20th century. It is likely 
that two main factors - (i) communication and information efficiency 
bias toward the use of shorter words [23] and (ii) the 
adoption of English as the leading global language for science 
- secured the eventual success of the word "Xray" by the year 
1980. It goes without saying that there are many social and 
technological factors driving language change. 

We begin this paper by analyzing the vocabulary growth of 
each language over time. We then analyze the lifetime growth 
trajectories of the set of words that are new to each language 
to gain quantitative insight into "infant" and "adult" stages of 
individual words. Using two sets of words, (i) the relatively 
new words, and (ii) the most common words, we analyze the 
statistical properties of word growth. Specifically, we calculate 
the probability density function $P(r)$ of growth rate $r$ and 
calculate the size-dependence of the standard deviation $\sigma(r)$ 
of growth rates. In order to gain insight into the long-term 
cultural memory, we conclude the analysis by measuring the 
autocorrelations in word use by applying detrended fluctuation 
analysis (DFA) to individual time series. 



Results 

Quantifying the birth rate and the death rate of words. 

Just as a new species can be born into an environment, a 
word can emerge in a language. Evolutionary selection laws 
can apply pressure on the sustainability of new words since 
there are limited resources (topics, books, etc.) for the use of 
words. Along the same lines, old words can be driven to ex- 
tinction when cultural and technological factors limit the use 
of a word, in analogy to the environmental factors that can 
change the survival capacity of a living species by altering its 
ability to survive and reproduce. 

We define the birth year $y_{0,i}$ as the year $t$ corresponding 
to the first instance of $f_i(t) > 0.05 f_i^m$, where $f_i^m$ is the median 
word use $f_i^m = \mathrm{Median}\{u_i(t)\}$ of a given word over its 
recorded lifetime in the Google database. Similarly, we define 
the death year $y_{f,i}$ as the last year $t$ during which the word 
use satisfies $f_i(t) > 0.05 f_i^m$. We use the relative word use 
threshold $0.05 f_i^m$ in order to avoid anomalies arising from 
extreme fluctuations in $f_i(t)$ over the lifetime of the word. 
The results obtained using the threshold $0.10 f_i^m$ did not show a 
significant qualitative difference. 
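
The threshold rule above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name and the toy series are hypothetical, the median is taken over the same relative-use series that is compared against the threshold, and the "recorded lifetime" is taken to be the years with nonzero use.

```python
from statistics import median

def birth_death_years(years, f, threshold=0.05):
    """Return (birth_year, death_year) for one word, or None if undefined.

    years: list of calendar years; f: relative use in each of those years.
    Birth (death) is the first (last) year in which f exceeds
    threshold * median(f), with the median taken over years of nonzero
    use (an assumption of this sketch).
    """
    recorded = [x for x in f if x > 0]
    if not recorded:
        return None
    cutoff = threshold * median(recorded)
    above = [y for y, x in zip(years, f) if x > cutoff]
    if not above:
        return None
    return above[0], above[-1]

# Toy series in arbitrary units: a spurious early use (1900) and a weak
# late use (1905) both fall below the 5% threshold and are ignored.
years = list(range(1900, 1907))
f = [0.001, 0.0, 1.0, 2.0, 3.0, 0.04, 0.0]
lifetime = birth_death_years(years, f)
```

In this toy series the isolated early and late uses do not affect the result, which is the kind of anomaly the threshold is meant to suppress.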

The significance of word births $\Delta_b(t)$ and word deaths 
$\Delta_d(t)$ for each year $t$ is related to the vocabulary size $N_w(t)$ 
of a given language. We define the birth rate $\gamma_b$ and death rate 
$\gamma_d$ by normalizing the number of births and deaths in a given 
year $t$ to the total number of distinct words $N_w(t)$ recorded in 
the same year $t$, so that 

$$ \gamma_b(t) = \Delta_b(t) / N_w(t) , \qquad \gamma_d(t) = \Delta_d(t) / N_w(t) . \qquad (3) $$



This definition yields a proxy for the rate of emergence and 
disappearance of words. We restrict our analysis to words 
with birth-death duration $y_{f,i} - y_{0,i} + 1 > 2$ years and to 
words with first recorded use $t_{0,i} > 1700$, which selects for 
relatively new words in the history of a language. 
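
A minimal sketch of the normalization in Eq. (3), together with the two selection criteria, is given below. The function name, the input layout, and the toy numbers are hypothetical. For simplicity the sketch applies the first-recorded-use cutoff to the birth year itself, which is an assumption rather than a statement of how the analysis was carried out.

```python
from collections import Counter

def birth_death_rates(lifetimes, vocab_size, min_duration=2, min_birth=1700):
    """Birth and death rates per year, normalized by vocabulary size N_w(t).

    lifetimes: iterable of (birth_year, death_year) pairs, one per word.
    vocab_size: dict mapping year -> number of distinct words N_w(t).
    Words are kept only if death - birth + 1 > min_duration and
    birth > min_birth (the latter stands in for the first-recorded-use cutoff).
    """
    births, deaths = Counter(), Counter()
    for birth, death in lifetimes:
        if death - birth + 1 > min_duration and birth > min_birth:
            births[birth] += 1
            deaths[death] += 1
    gamma_b = {t: births[t] / n for t, n in vocab_size.items() if n > 0}
    gamma_d = {t: deaths[t] / n for t, n in vocab_size.items() if n > 0}
    return gamma_b, gamma_d

# Toy input: the second word is too short-lived and the fourth is too old.
lifetimes = [(1900, 1950), (1900, 1901), (1910, 1950), (1650, 1950)]
vocab_size = {1900: 100, 1910: 200, 1950: 300}
gamma_b, gamma_d = birth_death_rates(lifetimes, vocab_size)
```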

FIG. 2: Dramatic shift in the birth rate and death rate of words. 
The word birth rate $\gamma_b(t)$ and the word death rate $\gamma_d(t)$ show marked 
underlying changes in word use competition which affect the entry 
rate and the sustainability of existing words. The modern print 
era shows a marked increase in the death rate of words which likely 
correspond to low fitness, misspelled and (technologically) outdated 
words. A simultaneous decrease in the birth rate of new words is 
consistent with the decreasing marginal need for new words indicated 
by the sub-linear allometric scaling between vocabulary size 
and total corpus size (Heaps' law) [24]. Interestingly, we quantitatively 
observe the impact of the Balfour Declaration in 1917, the 
circumstances surrounding which effectively rejuvenated Hebrew as 
a national language, resulting in a 5-fold increase in the birth rate of 
words in the Hebrew corpus. 

FIG. 3: Survival of the fittest in the entry process of words. 
Trends in the relative uses of words that either were born or died 
in a given year show that the entry-exit forces largely depend on the 
relative use of the word. For the English corpus, we calculate the 
average of the median lifetime relative use, $\langle \mathrm{Med}(f_i) \rangle$, for all words 
born in year $t$ (top panel) and for all words that died in year $t$ (bottom 
panel), together with a 5-year moving average (dashed black line). 
There is a dramatic increase in the relative use ("utility") of newborn 
words over the last 20-30 years, likely corresponding to new 
technical terms, which are necessary for the communication of core 
modern technology and ideas. Conversely, with higher editorial standards 
and the recent use of word processors which include spelling 
standardization technology, the words that are dying are those words 
with low relative use. We confirm by visual inspection that the lists 
of dying words contain mostly misspelled and nonsensical words. 



The $\gamma_b(t)$ and $\gamma_d(t)$ time series plotted in Fig. 2 for the 200-year 
period 1800-2000 show trends that intensify after the 
1950s. The modern era of publishing, which is characterized 
by stricter editing procedures at publishing houses, computerized 
word editing and automatic spell-checking technology, 
shows a drastic increase in the death rate of words. Using 
visual inspection we verify that most changes to the vocabulary in 
the last 10-20 years are due to the extinction of misspelled 
words and nonsensical print errors, and to the decreased birth 
rate of new misspelled variations and genuinely new words. 
This phenomenon reflects the decreasing marginal need for 
new words, consistent with the sub-linear Heaps' law observed 
for all Google 1-gram corpora in [24]. Moreover, Fig. 3 
shows that $\gamma_b(t)$ is largely comprised of words with relatively 
large median relative use while $\gamma_d(t)$ is almost entirely comprised 
of words with relatively small median relative use (see also Fig. 
S1 in the Supplementary Information (SI) text). Thus, the new 
words of tomorrow are likely to be core words that are widely 
used. 

We note that the main source of error in the calculation of 
birth and death rates is OCR (optical character recognition) 
error in the digitization process, which could be responsible 
for a significant fraction of misspelled and nonsensical words 
existing in the data. An additional source of error is the variety 
of orthographic properties of language that can make very 
subtle variations of words, for example through the use of hyphens 
and capitalization, appear as distinct words when applying 
OCR. The digitization of many books in the computer 
era does not require OCR transfer, since the manuscripts are 
themselves digital, and so there may be a bias resulting from 
this recent paradigm shift. We confirm that the statistical patterns 
found using post-2000 data are consistent with the patterns 
that extend back several hundred years [24]. 

Complementary to the death of old words is the birth of 
new words, which are commonly associated with new social 
and technological trends. Topical words in media can display 
long-term persistence patterns analogous to earthquake 
shocks [25, 26], and can result in a new word having larger 
fitness than related "out-of-date" words (e.g. blog vs. log, 
email vs. memo). Here we show that a comparison of the 
growth dynamics between different languages can also illustrate 
the local cultural factors that influence different regions 
of the world. Fig. 4 shows how international crises can lead to 
globalization of language through common media attention 
and increased lexical diffusion. Notably, as illustrated in 
Fig. 4(a), we find that international conflict only perturbed 
the participating languages, while minimally affecting the 
languages of the nonparticipating regions, e.g. the Spanish-speaking 
countries during WWII. 

The lifetime trajectory of words. Between birth and death, 
one contends with the interesting question of how the use of 
words evolves when they are "alive." We focus our efforts toward 
quantifying the relative change in word use over time, 

FIG. 4: The significance of historical events on the evolution of language. The standard deviation $\sigma(t)$ of growth rates demonstrates 
the sensitivity of language to international events (e.g. World War II). For all languages there is an overall decreasing trend in $\sigma(t)$ over the 
period 1850-2000. However, the increase in $\sigma(t)$ during WWII represents a "globalization" effect, whereby societies are brought together 
by a common event and a unified media. Such contact between relatively isolated systems necessarily leads to information flow, much as in 
the case of thermodynamic heat flow between two systems, initially at different temperatures, which are then brought into contact. (a) $\sigma(t)$ 
calculated for the relatively new words with $T_i > 100$ years. The Spanish corpus does not show an increase in $\sigma(t)$ during World War II, 
indicative of the relative isolation of South America and Spain from the European conflict. (b) $\sigma(t)$ for 4 sets of relatively new words that meet 
the criteria $T_i > T_c$ and $t_{0,i} > 1800$. The oldest "new" words ($T_c = 200$) demonstrate the most significant increase in $\sigma(t)$ during World 
War II, with a peak around 1945. (c) The standard deviation $\sigma(t)$ for the most common words is decreasing with time, suggesting that they 
have saturated and are being "crowded out" by new competitors. This set of words meets the criterion that the average relative use exceeds a 
threshold, $\langle f_i \rangle > f_c$, which we define for each corpus. (d) We compare the variation $\sigma(t)$ for relatively new English words, using $T_i > 100$, 
with the 20-year moving average over the time period 1820-1988. The deviations show that $\sigma(t)$ increases abruptly during times of conflict, 
such as the American Civil War (1861-1865), World War I (1914-1918) and World War II (1939-1945), and also during the 1980s and 1990s, 
possibly as a result of new digital media (e.g. the internet) which offer new environments for the evolutionary dynamics of word use. $D(t)$ is 
the difference between the moving average and $\sigma(t)$. 



both over the word lifetime and throughout the course of history. 
In order to analyze separately these two time frames, we 
select two sets of words: (i) relatively new words with "birth 
year" $t_{0,i}$ later than 1800, so that the relative age $\tau = t - t_{0,i}$ of 
word $i$ is the number of years after the word's first occurrence 
in the database, and (ii) relatively common words, typically 
with $t_{0,i} < 1800$. 

We analyze dataset (i) words (summary statistics in Table 
S1) so that we can control for properties of the growth dynamics 
that are related to the various stages of a word's life 
trajectory (e.g. an "infant" phase, an "adolescent" phase, and 
a "mature" phase). For comparison with the young words, we 
also analyze the growth rates of dataset (ii) words in the next 
section (summary statistics in Table S2). These words are presumably 
old enough that they are in a stable mature phase. We 
select dataset (ii) words using the criterion $\langle f_i \rangle > f_c$, where 
$\langle f_i \rangle = \sum_{t=t_{0,i}}^{t_{f,i}} f_i(t) / T_i$ is the average relative use of the word 
$i$ over the word's lifetime $T_i = t_{f,i} - t_{0,i} + 1$, and $f_c$ is a 
cutoff threshold derived from the Zipf rank-frequency distribution 
[1] calculated for each corpus [24]. In Table S3 we 
summarize the entire data for the 209-year period 1800-2008 
for each of the four Google language sets analyzed. 

Modern words typically are born in relation to technological 
or cultural events, e.g. "Antibiotics." We ask if there exists 
a characteristic time for a word's general acceptance. In order 
to search for patterns in the growth rates as a function of 
relative word age, for each new word $i$ at its age $\tau$, we analyze 
the "use trajectory" $f_i(\tau)$ and the "growth rate trajectory" 
$r_i(\tau)$. So that we may combine the individual trajectories 
of words of varying prevalence, we normalize each $f_i(\tau)$ 
by its average $\langle f_i \rangle$, obtaining a normalized use trajectory 
$f'_i(\tau) = f_i(\tau) / \langle f_i \rangle$. We perform an analogous normalization 
procedure for each $r_i(\tau)$, normalizing instead by the growth 
rate standard deviation $\sigma[r_i]$, so that $r'_i(\tau) = r_i(\tau) / \sigma[r_i]$ (see 
the Methods section for further detailed description). 
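
The two normalizations just described can be sketched as follows. This is an illustrative sketch only: the function name and the toy trajectory are hypothetical, the population standard deviation is used as an assumption, and the Methods section may specify additional detrending steps that are not reproduced here.

```python
import math
from statistics import mean, pstdev

def normalized_trajectories(f):
    """Return (f_norm, r_norm) for one word.

    f: relative use f_i(tau) for tau = 0, 1, ..., all values positive.
    f_norm[tau] = f_i(tau) / <f_i>
    r_norm[tau] = r_i(tau) / sigma[r_i], with r_i the annual log growth rate.
    """
    avg = mean(f)
    f_norm = [x / avg for x in f]
    r = [math.log(f[k + 1] / f[k]) for k in range(len(f) - 1)]
    sigma = pstdev(r)
    if sigma == 0:
        raise ValueError("growth rates are constant; cannot normalize")
    r_norm = [x / sigma for x in r]
    return f_norm, r_norm

# Toy use trajectory for one hypothetical word.
f = [1e-9, 2e-9, 2e-9, 4e-9, 3e-9]
f_norm, r_norm = normalized_trajectories(f)
```

After normalization every word has unit average use and unit growth-rate standard deviation, so trajectories of rare and common words can be pooled on the same scale.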

Since some words will die and other words will increase in 

FIG. 5: Quantifying the tipping point for word use. (a) The maximum 
in the standard deviation $\sigma$ of growth rates during the "adolescent" 
period $\tau \approx 30$-50 indicates the characteristic time scale for 
words being incorporated into the standard lexicon, i.e. inclusion in 
popular dictionaries. In Fig. S4 we plot the average growth rate trajectory 
$\langle r'(\tau|T_c) \rangle$ which shows relatively large positive growth rates 
during approximately the same 20-year period. (b) The first passage 
time $\tau_1$ [53] is defined as the number of years for the relative use of a 
new word $i$ to exceed a given $f$-value for the first time, $f_i(\tau_1) > f$. 
For relatively new words with $T_i > 100$ years we calculate the average 
first-passage time $\langle \tau_1(f) \rangle$ for a large range of $f$. We estimate for 
each language the $f_c$ representing the threshold for a word belonging 
to the standard "kernel" lexicon [4]. This method demonstrates that 
the English corpus threshold $f_c = 5 \times 10^{-8}$ maps to the first passage 
time corresponding to the peak period $\tau \approx 30$-50 years in $\sigma(\tau)$ 
shown in panel (a). 



use as a result of the standardization of language, we hypothesize 
that the average growth rate trajectory will show large 
fluctuations around the time scale for the transition of a word 
into regular use. In order to quantify this transition time scale, 
we create a subset $\{i \,|\, T_c\}$ of word trajectories $i$ by combining 
words that meet an age criterion $T_i > T_c$. Thus, $T_c$ is 
a threshold to distinguish words that were born in different 
historical eras and which have varying longevity. For the values 
$T_c = 25$, 50, 100, and 200 years, we select all words that 
have a lifetime longer than $T_c$ and calculate the average and 
standard deviation for each set of growth rate trajectories as a 
function of word age $\tau$. 
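
The age-resolved aggregation can be sketched as below. The function name and the toy trajectories are hypothetical, and two simplifications are assumptions of the sketch: the length of each growth-rate series is used as a stand-in for the lifetime $T_i$, and only ages below $T_c$ (which every selected word has reached) are reported.

```python
from statistics import mean, pstdev

def age_profile(trajectories, t_c):
    """Average and standard deviation of growth rates as a function of age.

    trajectories: list of normalized growth-rate series, one per word,
                  indexed by word age tau = 0, 1, 2, ...
    t_c: age threshold; only series longer than t_c are kept.
    Returns a list of (tau, mean, std) for tau = 0 .. t_c - 1.
    """
    selected = [r for r in trajectories if len(r) > t_c]
    if not selected:
        return []
    profile = []
    for tau in range(t_c):
        values = [r[tau] for r in selected]
        profile.append((tau, mean(values), pstdev(values)))
    return profile

# Three toy trajectories; the last one is too short for t_c = 2.
trajectories = [[1.0, -1.0, 2.0], [3.0, 1.0, 0.0, 5.0], [0.0, 0.0]]
profile = age_profile(trajectories, t_c=2)
```

A peak in the third element of each tuple as a function of tau would correspond to the kind of fluctuation maximum discussed in the text.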

In Fig. 5 we plot $\sigma[r'(\tau|T_c)]$ for the English corpus, which 
shows a broad peak around $\tau_c \approx 30$-50 years for each $T_c$ 
subset before the fluctuations saturate after the word enters 
a stable growth phase. A similar peak is observed for each 
corpus analyzed (Figs. S4-S7). This single-peak growth 
trajectory is consistent with theoretical models for logistic 
spreading and the fixation of words in a population of learners 
[27]. Also, since we weight the average according to $\langle f_i \rangle$, 
the time scale $\tau_c$ is likely associated with the characteristic 
time for a new word to reach sufficiently wide acceptance 
that the word is included in a typical dictionary. We note 
that this time scale is close to the generational time scale for 
humans, corroborating evidence that languages require only 
one generation to drastically evolve [27]. 



Empirical laws quantifying the growth rate distribution. 

How much do the growth rates vary from word to word? 
The answer to this question can help distinguish between 
candidate models for the evolution of word utility. Hence, 
we calculate the probability density function (pdf) of $R = 
r'_i(\tau) / \sigma[r'(\tau|T_c)]$. Using this quantity accounts for the fact 
that we are aggregating growth rates of words of varying ages. 
The empirical pdf $P(R)$ shown in Fig. 6 is leptokurtic and remarkably 
symmetric around $R \approx 0$. These empirical facts 
are also observed in studies of the growth rates of economic 
institutions [28-31]. Since the $R$ values are normalized and 
detrended according to the age-dependent standard deviation 
$\sigma[r'(\tau|T_c)]$, the standard deviation is $\sigma(R) = 1$ by construction. 

A candidate model for the growth rates of word use is 
the Gibrat proportional growth process [29, 30], which predicts 
a Gaussian distribution for $P(R)$. However, we observe 
the "tent-shaped" pdf $P(R)$ which is well-approximated by a 
Laplace (double-exponential) distribution, defined as 

$$ P(R) \equiv \frac{1}{\sqrt{2} \sigma(R)} \exp\left[ -\sqrt{2} \, |R - \langle R \rangle| / \sigma(R) \right] . \qquad (4) $$

Here the average growth rate $\langle R \rangle$ has two properties: (a) 
$\langle R \rangle \approx 0$ and (b) $\langle R \rangle \ll \sigma(R)$. Property (a) arises from the 
fact that the growth rate of distinct words is quite small on the 
annual basis (the growth rate of books in the Google English 
database is $\gamma_w \approx 0.011$ [24]) and property (b) arises from the 
fact that $R$ is defined in units of standard deviation. Being 
leptokurtic, the Laplace distribution predicts an excess number 
of events $> 3\sigma$ as compared to the Gaussian distribution. 
For example, comparing the likelihood of events above the 
$3\sigma$ event threshold, the Laplace distribution displays a five-fold 
excess in the probability $P(|R - \langle R \rangle| > 3\sigma)$, where 
$P(|R - \langle R \rangle| > 3\sigma) = \exp[-3\sqrt{2}] \approx 0.014$ for the Laplace 
distribution, whereas $P(|R - \langle R \rangle| > 3\sigma) = \mathrm{Erfc}[3/\sqrt{2}] \approx 
0.0027$ for the Gaussian distribution. The large $R$ values correspond 
to periods of rapid growth and decline in the use of 
words during the crucial "infant" and "adolescent" lifetime 
phases. In Fig. 6(b) we also show that the growth rate distribution 
$P(r')$ for the relatively common words comprising 

FIG. 6: Common leptokurtic growth distribution for new words 
and common words. (a) Independent of language, the growth rates 
of relatively new words are distributed according to the Laplace distribution 
centered around $R \approx 0$ defined in Eq. (4). The growth 
rate $R$ defined in Eq. (11) is measured in units of standard deviation, 
and accounts for age-dependent and word-dependent factors. Yet, 
even with these normalizations, we still observe an excess number of 
$|R| > 3\sigma$ events. This fact is demonstrated by the leptokurtic form 
of each $P(R)$, which exhibits excess tail frequencies when compared 
with a unit-variance Gaussian distribution (dashed blue curve). 
The Gaussian distribution is the predicted distribution for the Gibrat 
proportional growth model, which is a candidate neutral null-model 
for the growth dynamics of word use [29]. The prevalence of large 
growth rates illustrates the possibility that words can have large variations 
in use even over the course of a year. The growth variations 
are intrinsically related to the dynamics of everyday life and reflect 
the cultural and technological shocks in society. We analyze word 
use data over the time period 1800-2008 for new words $i$ with lifetimes 
$T_i > T_c$, where we show data calculated for $T_c = 100$ years. 
(b) PDF $P(r')$ of the annual relative growth rate $r'$ for all words 
which satisfy $\langle f_i \rangle > f_c$ (dataset (ii) words, which are relatively common 
words). In order to select relatively frequently used words, we 
use the following criteria: $T_i > 10$ years, $1800 < t < 2008$, and 
$\langle f_i \rangle > f_c$. The growth rate $r'$ does not account for age-dependent 
factors since the common words are likely in the mature phase of 
their lifetime trajectory. In each panel, we plot a Laplace distribution 
with unit variance (solid black lines) and the Gaussian distribution 
with unit variance (dashed blue curve) for reference. 

FIG. 7: Scaling in the growth rate fluctuations of words. We show 
the dependence of growth rates on the cumulative word frequency 
$S_i = \sum_{t'=0}^{t} f_i(t')$ using words that satisfy the criterion $T_i > 10$ years. 
We verify similar results for threshold values $T_c = 50$, 100, and 200 
years. (a) The average growth rate $\langle r \rangle$ saturates at relatively constant 
values for large $S_i$. (b) Scaling in the standard deviation of growth 
rates $\sigma(r|S_i) \sim S_i^{-\beta}$ for words with large $S_i$. This scaling relation 
is also observed for the growth rates of large economic institutions, 
ranging in size from companies to entire countries [31-33]. Here this 
size-variance relation corresponds to scaling exponent values $0.10 < 
\beta < 0.21$, which are related to the non-trivial bursting patterns and 
non-trivial correlation patterns in literature topicality as indicated by 
the quantitative relation to the Hurst exponent, $H = 1 - \beta$, shown in 
[35]. We calculate $\beta_{Eng.} \approx 0.16 \pm 0.01$, $\beta_{Eng.fict} \approx 0.21 \pm 0.01$, 
$\beta_{Spa.} \approx 0.10 \pm 0.01$ and $\beta_{Heb.} \approx 0.17 \pm 0.01$. 



dataset (ii) is also well-described by the Laplace distribution. 
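
The tail probabilities quoted above for the Laplace distribution of Eq. (4) and for a Gaussian of equal variance can be checked numerically with a short sketch. The function names are hypothetical; the closed forms follow from the standard result that a Laplace distribution with standard deviation $\sigma$ has scale parameter $\sigma/\sqrt{2}$.

```python
import math

def laplace_tail(k):
    """P(|R - <R>| > k * sigma) for a Laplace distribution with std sigma."""
    return math.exp(-k * math.sqrt(2))

def gaussian_tail(k):
    """P(|R - <R>| > k * sigma) for a Gaussian distribution with std sigma."""
    return math.erfc(k / math.sqrt(2))

p_laplace = laplace_tail(3)     # approximately 0.014
p_gauss = gaussian_tail(3)      # approximately 0.0027
excess = p_laplace / p_gauss    # roughly a five-fold excess
```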
For hierarchical systems consisting of units each with complex 
internal structure [32] (e.g. a given country consists of industries, 
each of which consists of companies, each of which 
consists of internal subunits), a non-trivial scaling relation between 
the standard deviation of growth rates $\sigma(r|S)$ and the 
system size $S$ has the form 

$$ \sigma(r|S) \sim S^{-\beta} . \qquad (5) $$

The theoretical prediction in [32, 33] that $\beta \in [0, 1/2]$ has 
been verified for several economic systems, with empirical $\beta$ 
values typically in the range $0.1 < \beta < 0.3$ [33]. 

Since different words have varying lifetime trajectories as 
well as varying relative utilities, we now quantify how the 
standard deviation $\sigma(r|S_i)$ of growth rates $r$ depends on the 
cumulative word frequency 

$$ S_i \equiv \sum_{t'=0}^{t} f_i(t') , \qquad (6) $$

of each word. We choose this definition as a proxy for "word 
size" since a writer can learn and recall a given word through 
any of its historical uses. Hence, $S_i$ is also proportional to the 
number of books in which word $i$ appears. This is significantly 
different from the assumptions of replication null models (e.g. 
the Moran process) which use the concurrent frequency $f_i(t)$ 
as the sole factor determining the likelihood of future replication 
[10, 18]. 

We estimate Eq. (5) by grouping words according to 
$S_i$ and then calculating the growth rate standard deviation 
$\sigma(r|S_i)$ for each group. Fig. 7(b) shows scaling behavior 
consistent with Eq. (5) for large $S_i$, with $\beta \approx 0.10$-0.21 
depending on the corpus. A positive $\beta$ value means that 
words with larger cumulative word frequency have smaller 
annual growth rate fluctuations. We conjecture that this 
statistical pattern emerges from the hierarchical organization 
of written language lfl"2Hl6l and the social properties of the 
speakers who use the words |l8][T2l[34l. As such, we calculate 
$\beta$ values that are consistent with nontrivial correlations in 
word use, likely related to the basic fact that books are topical 
[3] and that book topics are correlated with cultural trends. 
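
The grouping procedure used to estimate the exponent in Eq. (5) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the logarithmic binning with two bins per decade, and the ordinary least-squares fit on log-log axes are choices made for the sketch and are not specified in the text, and the synthetic input is constructed to have an exact exponent of 0.2.

```python
import math
from statistics import pstdev

def size_variance_exponent(samples, bins_per_decade=2):
    """Estimate beta in sigma(r|S) ~ S^(-beta).

    samples: list of (S, r) pairs, with S > 0 the cumulative word
             frequency and r a growth rate observed at that size.
    Pairs are grouped into logarithmic bins of S; beta is minus the
    least-squares slope of log(sigma) against the mean log(S) per bin.
    """
    groups = {}
    for s, r in samples:
        key = math.floor(math.log10(s) * bins_per_decade)
        groups.setdefault(key, []).append((s, r))
    xs, ys = [], []
    for members in groups.values():
        if len(members) < 2:
            continue
        sigma = pstdev([r for _, r in members])
        if sigma <= 0:
            continue
        xs.append(sum(math.log(s) for s, _ in members) / len(members))
        ys.append(math.log(sigma))
    if len(xs) < 2:
        raise ValueError("need at least two populated bins")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return -sxy / sxx

# Synthetic check: growth rates +/- S^(-0.2) have standard deviation S^(-0.2).
samples = []
for s in (3e-6, 3e-5, 3e-4, 3e-3):
    amplitude = s ** -0.2
    samples += [(s, amplitude), (s, -amplitude)]
beta = size_variance_exponent(samples)
```

With real data each bin would contain many words and the fit would be restricted to large $S_i$, where the scaling is reported to hold.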



Quantifying the long-term cultural memory. Recent theoretical 
work [35] shows that there is a fundamental relation 
between the size-variance exponent $\beta$ and the Hurst exponent 
$H$ quantifying the auto-correlations in a stochastic time series. 
The novel relation $H = 1 - \beta$ indicates that the temporal long-term 
persistence is intrinsically related to the capability of the 
underlying mechanism to absorb stochastic shocks. Hence, 
positive correlations ($H > 1/2$) are predicted for non-trivial 
$\beta$ values (i.e. $0 < \beta < 0.5$). Note that the Gibrat proportional 
growth model predicts $\beta = 0$ and that a Yule-Simon 
urn model predicts $\beta = 0.5$ [33]. Thus, $f_i(t)$ belonging to 
words with large $S_i$ are predicted to show significant positive 
correlations, $H_i > 1/2$. 

To test this connection between memory correlations and 
the size-variance scaling, we calculate the Hurst exponent $H_i$ 
for each time series belonging to the relatively common 
words analyzed in dataset (ii) using detrended fluctuation 
analysis (DFA) [35-37]. We plot in Fig. S2 the relative use 
time series $f_i(t)$ for the words "polyphony," "Americanism," 
"Repatriation," and "Antibiotics" along with DFA curves from 
which we calculate each $H_i$. Fig. S2(b) shows that the $H_i$ 
values for these four words are all significantly greater than 
$H = 0.5$, which is the expected Hurst exponent for a stochastic 
time series with no temporal correlations. In Fig. S3 we 
plot the distribution of $H_i$ values for the English fiction corpus 
and the Spanish corpus. Our results are consistent with 
the theoretical prediction $\langle H \rangle = 1 - \beta$ established in [35] relating 
the variance of growth rates to the underlying temporal 
correlations in each $f_i(t)$. Hence, we show that 
language evolution is fundamentally related to the complex features of 
cultural memory, i.e. the dynamics of cultural topic formation 
lUllllSllIllIll and bursting [38,^. 
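
As a quick consistency check of the relation $\langle H \rangle = 1 - \beta$, the short sketch below compares the prediction with the mean Hurst exponents quoted in the caption of Fig. S3. The dictionary layout and the function names are illustrative choices, not part of the original analysis; only the numerical values are taken from the caption.

```python
# Values quoted in the caption of Fig. S3: mean and spread of H_i, and the
# size-variance exponent beta, for the two corpora analyzed there.
reported = {
    "English fiction": {"H_mean": 0.77, "H_std": 0.23, "beta": 0.21},
    "Spanish": {"H_mean": 0.90, "H_std": 0.29, "beta": 0.10},
}


def predicted_hurst(beta):
    """Mean Hurst exponent predicted by the relation <H> = 1 - beta."""
    return 1.0 - beta


def compare(values):
    """Per corpus: (predicted <H>, measured minus predicted, gap in units of the H spread)."""
    out = {}
    for corpus, v in values.items():
        pred = predicted_hurst(v["beta"])
        diff = v["H_mean"] - pred
        out[corpus] = (pred, diff, abs(diff) / v["H_std"])
    return out


if __name__ == "__main__":
    for corpus, (pred, diff, z) in compare(reported).items():
        print(f"{corpus}: predicted <H> = {pred:.2f}, "
              f"measured - predicted = {diff:+.2f} ({z:.2f} sigma)")
```

For both corpora the gap between the measured and predicted mean is much smaller than the spread of the individual $H_i$ values.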



Discussion 



With the digitization of written language, cultural trend 
analysis based around methods to extract quantitative patterns 
from word counts is an emerging interdisciplinary field that 
has the potential to provide novel insights into human sociol- 
ogy d El ES] |26l |M1 SQI . Nevertheless, the amount of meta- 
data extractable from daily internet feeds is dizzying. This is 
highlighted by the practical issue of defining objective signif- 
icance levels to filter out the noise in the data deluge. For ex- 
ample, online blogs can be vaguely categorized according to 
the coarse hierarchical schema: "obscure blogs", "more popu- 
lar blogs", "tech columns", and "mainstream news coverage." 
In contrast, there are well-defined entry requirements for pub- 
lished books and magazines, which must meet editorial stan- 
dards and conform to the principles of market supply and de- 
mand. However, until recently, the vast information captured 
in the annals of written language was largely inaccessible. 

Despite the careful guard of libraries around the world, 
which house the written corpora for almost every written 
language, little is known about the aggregate dynamics of 
word evolution in written history. Inspired by research on 
the growth patterns displayed by a wide range of competition-driven
systems - from countries and business firms [28–33, 41–44] to
religious activities [45], universities [46], scientific journals
[47], careers [48] and bird populations [49] - here we extend the
concepts and methods to word use dynamics.

This study provides empirical evidence that words are com- 
peting actors in a system of finite resources. Just as busi- 
ness firms compete for market share, words demonstrate the 
same growth statistics because they are competing for the use 
of the writer/speaker and for the attention of the correspond- 
ing reader/listener ifTSljSTllZTl . A prime example of fitness- 
mediated evolutionary competition is the case of irregular and 
regular verb use in English. By analyzing the regularization 
rate of irregular verbs through the history of the English language,
Lieberman et al. [50] show that the irregular verbs that
are used more frequently are less likely to be overcome by 
their regular verb counterparts. Specifically, they find that the 
irregular verb death rate scales as the inverse square root of the 
word's relative use. A study of word diffusion across Indo- 
European languages shows similar frequency-dependence of 
word replacement rates [51].

We document the case example of X-ray, which shows 
how categorically related words can compete in a zero-sum 
game. Moreover, this competition does not occur in a vac- 
uum. Instead, the dynamics are significantly related to dif- 
fusion and technology. Lexical diffusion occurs at many 
scales, both within relatively small groups and across nations
[27, 34, 51]. The technological forces underlying word
selection have changed significantly over the last 20 years. 
With the advent of automatic spell-checkers in the digital era, 
words recognized by spell-checkers receive a significant boost 
in their "reproductive fitness" at the expense of their "mis- 
spelled" or unstandardized counterparts. 

We find that the dynamics are influenced by historical context,
trends in global communication, and the means for standardizing that
communication. Analogous to recessions and booms in a global economy,
the marketplace for words waxes and wanes with a global pulse as
historical events unfold. And in analogy to financial regulations
meant to limit risk and market domination, standardization
technologies such as the dictionary and spell checkers serve as
powerful arbiters in determining the characteristic properties of
word evolution. Context matters, and so we anticipate that niches
[34] in various language ecosystems (ranging from spoken word to
professionally published documents to various online forms such as
chats, tweets and blogs) have heterogeneous selection laws that may
favor a given word in one arena but not another. Moreover, the birth
and death rate of words and their close associates (misspellings,
synonyms, abbreviations) depend on factors endogenous to the language
domain such as correlations in word use to other partner words and
polysemous contexts [12, 13] as well as exogenous socio-technological
factors and demographic aspects of the writers, such as age [13] and
social niche [34].

We find a pronounced peak in the fluctuations of word 
growth rates when a word has reached approximately 30-50 
years of age (see Fig. B}. We posit that this corresponds to 
the timescale for a word to be accepted into a standardized 
dictionary which inducts words that are used above a thresh- 
old frequency, consistent with the first-passage times to fc in 
Fig. ISjb). This is further corroborated by the characteristic 
baseline frequencies associated with standardized dictionar- 
ies ifTTl . Another important timescale in evolutionary sys- 
tems is the reproduction age of the interacting gene or meme 
host. Interestingly, a 30-50 year timescale is roughly equal to 
the characteristic human generational time scale. The prominent role
of a new generation of speakers in language evolution has precedent
in linguistics. For example, it has been shown that primitive pidgin
languages, which are little more than crude mixes of parent
languages, spontaneously acquire the full range of complex syntax and
grammar once they are learned by the children of a community as a
native language. It is at this point that a pidgin becomes a creole,
in a process referred to as nativization [22].

Nativization also had a prominent effect in the revival of 
the Hebrew language, a significant historical event which also 
manifests prominently in our statistical analysis. The birth 
rate of new words in the Hebrew language jumped by a factor 
of 5 in just a few short years around 1920 following the Bal- 
four Declaration of 1917 and the Second Aliyah immigration 
to Israel. The combination of new Hebrew-speaking commu- 
nities and political endorsement of a national homeland for the 
Jewish people in the Palestine Mandate had two resounding 
effects: (i) the Hebrew language, hitherto used largely only for
(religious) writing, gained official status as a modern spoken
language, and (ii) a centralized culture emerged from this na- 
tional community. The unique history of the Hebrew language 
in concert with the Google Inc. books data thus provide an un- 
precedented opportunity to quantitatively study the emerging 
dynamics of what is, in some regards, a new language. 

The impact of historical context on language dynamics is not limited
to emerging languages, but extends to languages that have been active
and evolving continuously for a thousand years. We find that
historical episodes can drastically perturb the properties of
existing languages over large time scales. Moreover, recent studies
show evidence for short-timescale cascading behavior in blog trends
[25, 26], analogous to the aftershocks following earthquakes and the
cascades of market volatility following financial news announcements
[52]. The nontrivial autocorrelations and the leptokurtic growth
distributions demonstrate the significance of exogenous shocks, which
can result in growth rates that significantly exceed the frequencies
that one would expect from non-interacting proportional growth models
[29, 30].

A large number of the world's ethnic groups are separated along
linguistic lines. A language barrier can isolate its speakers by
serving as a screen to external events, which may further slow the
rate of language evolution by stalling endogenous change.
Nevertheless, we find that the distribution of word growth rates
significantly broadens during times of large scale conflict, revealed
through the sudden increases in $\sigma(t)$ for the English, French,
German and Russian corpora during World War II [24]. This can be
understood as manifesting from the unification of public
consciousness that creates a fertile breeding ground for new topics
and ideas. During war, people may be more likely to have their
attention drawn to global issues. Remarkably, the pronounced change
during WWII was not observed for the Spanish corpus, documenting the
relatively small roles that Spain and Latin American countries played
in the war.



Methods 



Quantifying the word use trajectory. Once a word is intro- 
duced into a language, what are the characteristic growth pat- 
terns? To address this question, we first account for important 
variations in words, as the growth dynamics may depend on 
the frequency of the word as well as social and technological 
aspects of the time-period during which the word was born. 

Here we define the age or trajectory year $\tau \equiv t - t_{0,i}$ as
the number of years after the word's first appearance in the
database. In order to compare trajectories across time and across
varying word frequency, we normalize the trajectories for each word
$i$ by the average use

$$\langle f_i \rangle \equiv \frac{1}{T_i} \sum_{t=t_{0,i}}^{t_{f,i}} f_i(t) \tag{7}$$

over the lifetime $T_i \equiv t_{f,i} - t_{0,i} + 1$ of the word,
leading to the normalized trajectory,

$$f'_i(\tau) \equiv f'_i(t - t_{0,i} \mid t_{0,i}, T_i) \equiv f_i(t - t_{0,i}) / \langle f_i \rangle . \tag{8}$$

By analogy, in order to compare various growth trajectories, we
normalize the relative growth rate trajectory $r_i(t)$ by the
standard deviation over the entire lifetime,

$$\sigma[r_i] \equiv \sqrt{\frac{1}{T_i} \sum_{t=t_{0,i}}^{t_{f,i}} \left[ r_i(t) - \langle r_i \rangle \right]^2} . \tag{9}$$

Hence, the normalized relative growth trajectory is

$$r'_i(\tau) \equiv r'_i(t - t_{0,i} \mid t_{0,i}, T_i) \equiv r_i(t - t_{0,i}) / \sigma[r_i] . \tag{10}$$
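
The normalizations in Eqs. (7)-(10) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function name is hypothetical, and the growth rate is assumed to be the annual logarithmic change $r_i(t) = \ln f_i(t+1) - \ln f_i(t)$, which is not defined in this section.

```python
import numpy as np


def normalized_trajectories(f):
    """Normalize one word's relative-use series f (one value per year of its lifetime).

    Returns (f_norm, r_norm):
      f_norm : f divided by its lifetime average <f_i>             (Eqs. 7-8)
      r_norm : growth rates divided by their lifetime std sigma[r_i] (Eqs. 9-10)
    ASSUMPTION: the growth rate is r(t) = ln f(t+1) - ln f(t).
    """
    f = np.asarray(f, dtype=float)
    if f.ndim != 1 or f.size < 3 or np.any(f <= 0):
        raise ValueError("need a 1-D series of at least 3 positive values")
    f_norm = f / f.mean()
    r = np.diff(np.log(f))           # annual logarithmic growth rates
    sigma_r = r.std()                # population standard deviation (1/T normalization)
    if sigma_r < 1e-12:
        raise ValueError("growth rates are constant; sigma[r_i] is zero")
    return f_norm, r / sigma_r


if __name__ == "__main__":
    # Toy series: use doubles twice and then halves twice.
    f_norm, r_norm = normalized_trajectories([1.0, 2.0, 4.0, 2.0, 1.0])
    print(f_norm)   # averages to 1 over the lifetime
    print(r_norm)   # unit standard deviation over the lifetime
```

After this step every word is measured in units of its own average use and its own growth-rate volatility, which is what makes trajectories of rare and common words comparable.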



Figs. S4-S7 show the weighted averages $\langle f'(\tau|T_c) \rangle$
and $\langle r'(\tau|T_c) \rangle$ and the weighted standard
deviations $\sigma[f'(\tau|T_c)]$ and $\sigma[r'(\tau|T_c)]$
calculated using normalized trajectories for new words in each
corpus. We compute $\langle \cdots \rangle$ and $\sigma[\cdots]$ for
each trajectory year $\tau$ using all $N_t$ trajectories (Table S1)
that satisfy the criteria $T_i > T_c$ and $t_{0,i} \geq 1800$. We
compute the weighted average and the weighted standard deviation
using $\langle f_i \rangle$ as the weight value for word $i$, so that
$\langle \cdots \rangle$ and $\sigma[\cdots]$ reflect the lifetime
trajectories of the more common words that are "new" to each corpus.
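
The weighted statistics described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function name and argument layout are hypothetical, and for simplicity only the first $T_c$ trajectory years are evaluated, so that every retained word contributes to every year.

```python
import numpy as np


def weighted_trajectory_stats(trajectories, weights, T_c):
    """Weighted mean and std of normalized trajectories for each trajectory year tau.

    trajectories : list of 1-D arrays, one normalized trajectory per word (index 0 is tau = 0)
    weights      : average use <f_i> of each word, used as its weight
    T_c          : lifetime threshold in years; words with lifetime T_i > T_c are kept
    SIMPLIFICATION: only the first T_c years of each kept trajectory are used.
    """
    kept = [(np.asarray(x, dtype=float)[:T_c], w)
            for x, w in zip(trajectories, weights) if len(x) > T_c]
    if not kept:
        raise ValueError("no trajectory satisfies the lifetime threshold")
    X = np.vstack([x for x, _ in kept])              # shape (N_t, T_c)
    w = np.array([w for _, w in kept], dtype=float)
    mean = np.average(X, axis=0, weights=w)          # weighted average per year
    var = np.average((X - mean) ** 2, axis=0, weights=w)
    return mean, np.sqrt(var)


if __name__ == "__main__":
    trajs = [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0], [5.0, 5.0]]
    mean, std = weighted_trajectory_stats(trajs, weights=[1.0, 3.0, 10.0], T_c=2)
    print(mean, std)   # the third word is dropped because its lifetime is not > T_c
```

Because the weights are the average uses, a handful of common words can dominate the averages, which is the intended behavior described in the text.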

Since there is an intrinsic word maturity $\sigma[r'(\tau|T_c)]$ that
is not accounted for in the quantity $r'_i(\tau)$, we further define
the detrended relative growth

$$R \equiv r'_i(\tau) / \sigma[r'(\tau|T_c)] \tag{11}$$

which allows us to compare the growth factors for new words at
various life stages. The result of this normalization is to rescale
the standard deviations for a given trajectory year $\tau$ to unity
for all values of $r'_i(\tau)$.
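
A minimal sketch of the detrending in Eq. (11), assuming the normalized growth trajectories have already been arranged in a rectangular array (one row per word, one column per trajectory year) and that the weights are the average uses $\langle f_i \rangle$. The function name and array layout are illustrative assumptions.

```python
import numpy as np


def detrended_growth(r_norm, weights):
    """Divide normalized growth trajectories by the weighted std at each trajectory year.

    r_norm  : 2-D array, shape (number of words, number of trajectory years);
              row i holds r'_i(tau) for a word whose lifetime covers every column
    weights : average use <f_i> of each word
    """
    r_norm = np.asarray(r_norm, dtype=float)
    w = np.asarray(weights, dtype=float)
    mean = np.average(r_norm, axis=0, weights=w)
    sigma = np.sqrt(np.average((r_norm - mean) ** 2, axis=0, weights=w))
    if np.any(sigma == 0):
        raise ValueError("zero spread in at least one trajectory year")
    return r_norm / sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic growth rates whose spread grows with trajectory year.
    r = rng.normal(size=(50, 10)) * np.linspace(0.5, 3.0, 10)
    w = rng.uniform(0.1, 1.0, size=50)
    R = detrended_growth(r, w)
    m = np.average(R, axis=0, weights=w)
    print(np.sqrt(np.average((R - m) ** 2, axis=0, weights=w)))  # unity in every year
```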

Detrended fluctuation analysis of individual $f_i(t)$. Here we
outline the DFA method for quantifying temporal autocorrelations in a
general time series $f_i(t)$ that may have underlying trends, and
compare the output with the results expected from a time series
corresponding to a 1-dimensional random walk.

In a time interval $\delta t$, a time series $Y(t)$ deviates from the
previous value $Y(t - \delta t)$ by an amount
$\delta Y(t) \equiv Y(t) - Y(t - \delta t)$. A powerful result of the
central limit theorem, equivalent to Fick's law of diffusion in 1
dimension, is that if the displacements are independent
(uncorrelated, corresponding to a simple Markov process), then the
total displacement $\Delta Y(t) = Y(t) - Y(0)$ from the initial
location $Y(0) \equiv 0$ scales according to the total time $t$ as

$$\Delta Y(t) \equiv Y(t) \sim t^{1/2} . \tag{12}$$

However, if there are long-term correlations in the time series
$Y(t)$, then the relation is generalized to

$$\Delta Y(t) \sim t^{H} , \tag{13}$$

where $H$ is the Hurst exponent which corresponds to positive
correlations for $H > 1/2$ and negative correlations for $H < 1/2$.
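
The $t^{1/2}$ scaling for independent displacements follows in a few lines. As a short sketch, assume $n = t/\delta t$ steps $\delta Y_k$ with zero mean and a common finite variance $\sigma_\delta^2$ (the symbols $n$, $\delta Y_k$ and $\sigma_\delta$ are introduced here only for illustration):

```latex
\begin{align}
Y(t) &= \sum_{k=1}^{n} \delta Y_k , \qquad n = t/\delta t , \\
\langle Y(t)^2 \rangle
  &= \sum_{k=1}^{n} \langle \delta Y_k^2 \rangle
   + \sum_{k \neq l} \langle \delta Y_k \, \delta Y_l \rangle
   = n \, \sigma_\delta^2 , \\
\sqrt{\langle Y(t)^2 \rangle}
  &= \sigma_\delta \, (t/\delta t)^{1/2} \sim t^{1/2} .
\end{align}
```

The cross terms vanish only because the steps are uncorrelated. When they are positively (negatively) correlated, the cross terms add to (subtract from) the sum, and the root-mean-square displacement grows faster (slower) than $t^{1/2}$, which is the case $H > 1/2$ ($H < 1/2$).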

Since there may be underlying social, political, and technological
trends that influence each time series $f_i(t)$, we use the detrended
fluctuation analysis (DFA) method [35–37] to analyze the residual
fluctuations $\Delta f_i(t)$ after we remove the local trends. The
method detrends the time series using time windows of varying length
$\Delta t$. The time series $f_i(t|\Delta t)$ corresponds to the
locally detrended time series using window size $\Delta t$. We
calculate the Hurst exponent $H$ using the relation between the
root-mean-square displacement $F(\Delta t)$ and the window size
$\Delta t$ [35–37],

$$F(\Delta t) = \sqrt{\langle \Delta f_i(t|\Delta t)^2 \rangle} \sim \Delta t^{H} . \tag{14}$$

Here $\Delta f_i(t|\Delta t)$ is the local deviation from the average
trend, analogous to $\Delta Y(t)$ defined above.
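
A compact sketch of the DFA procedure is given below. It follows the standard recipe (integrate the mean-subtracted series, detrend with a polynomial fit in non-overlapping windows, and read the exponent from the slope of $\log F$ versus $\log \Delta t$). The specific choices here are assumptions for illustration and are not stated in the text: linear detrending, non-overlapping windows, and log-spaced window sizes between 4 and one quarter of the series length.

```python
import numpy as np


def dfa_hurst(x, window_sizes=None, order=1):
    """Estimate the DFA scaling exponent of a 1-D series x.

    Steps: (1) integrate the mean-subtracted series into a profile,
    (2) split the profile into non-overlapping windows of size dt,
    (3) remove a polynomial trend of the given order in each window,
    (4) F(dt) = root-mean-square of the residuals,
    (5) the exponent is the slope of log F(dt) versus log dt.
    Returns (exponent, window sizes, F values).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 16:
        raise ValueError("series too short for this sketch (need at least 16 points)")
    if window_sizes is None:
        # ASSUMED default: log-spaced window sizes between 4 and n/4.
        window_sizes = np.unique(
            np.logspace(np.log10(4), np.log10(n // 4), 12).astype(int))
    window_sizes = np.asarray(window_sizes, dtype=int)
    profile = np.cumsum(x - x.mean())
    F = []
    for dt in window_sizes:
        n_win = n // dt
        segments = profile[: n_win * dt].reshape(n_win, dt)
        t = np.arange(dt)
        sq_resid = []
        for seg in segments:
            coeffs = np.polyfit(t, seg, order)        # local trend in this window
            sq_resid.append(np.mean((seg - np.polyval(coeffs, t)) ** 2))
        F.append(np.sqrt(np.mean(sq_resid)))
    F = np.array(F)
    slope, _ = np.polyfit(np.log(window_sizes), np.log(F), 1)
    return slope, window_sizes, F


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(4096)
    print("uncorrelated noise:", dfa_hurst(noise)[0])
    print("integrated noise:  ", dfa_hurst(np.cumsum(noise))[0])
```

For an uncorrelated series the estimated exponent should lie close to 0.5, while a strongly persistent series gives a clearly larger value. Short series such as the 209 annual values available per word leave only a narrow range of window sizes, so individual estimates are noisy.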



Fig. S2 shows 4 different $f_i(t)$ in panel (a), and plots the
corresponding $F_i(\Delta t)$ in panel (b). The calculated $H_i$
values for these 4 words are all significantly greater than the
uncorrelated $H = 0.5$ value, indicating strong positive long-term
correlations in the use of these words, even after we have removed
the local trends using DFA. In these example cases, the trends are
related to political events such as war in the cases of
"Americanism" and "Repatriation", or the bursting associated with new
technology in the case of "Antibiotics," or new musical trends
illustrated in the case of "polyphony."

In Fig. S3 we plot the pdf of $H_i$ values calculated for the
relatively common words analyzed in Fig. 6(b). We also plot the pdf
of $H_i$ values calculated from shuffled time series, and these
values are centered around $\langle H \rangle \approx 0.5$ as
expected from the removal of the intrinsic temporal ordering. Thus,
using this method, we are able to quantify the social memory
characterized by the Hurst exponent which is related to the bursting
properties of linguistic trends, and in general, to bursting
phenomena in human dynamics.
FIG. S1: The birth and death rates of a word depend on the relative use of the word. For the English corpus, we calculate the birth and
death rates for words with median lifetime relative use $\mathrm{Med}(f_i)$ satisfying $\mathrm{Med}(f_i) > f_c$. The difference in the birth rate curves corresponds
to the contribution to the birth rate of words in between the two $f_c$ thresholds, and so the small difference in the curves for small $f_c$ indicates
that the birth rate is largely comprised of words with relatively large $\mathrm{Med}(f_i)$. Consistent with this finding, the largest contribution to the death
rate is from words with relatively low $\mathrm{Med}(f_i)$. By visually inspecting the lists of dying words, we confirm that words with large relative use
rarely become completely extinct (see Fig. 1 for a counterexample word "Roentgenogram" which was once a frequently used word, but has
since been eliminated due to competitive forces with other high-fitness competitors).
FIG. S2: Measuring the social memory effect using the trajectories of single words. We measure the Hurst exponent for individual $f_i(t)$
using the detrended fluctuation analysis method [35–37]. (a) Four example $f_i(t)$, given in units of the average use $\langle f_i \rangle$, show bursting of
use as a result of social and political "shock" events. We choose these four examples based on their relatively large $H_i > 0.5$ values. The
use of "polyphony" in the English corpus shows peaks during the eras of jazz and rock and roll. The use of "Americanism" shows bursting
during times of war, and the use of "Repatriation" shows an approximate 10-year lag in the bursting after WWII and the Vietnam War. The
use of the word "Antibiotics" is related to technological advancement. The top 3 curves are vertically displaced by a constant from the value
$f_i(1800) \approx 0$ so that the curves can be distinguished. (b) We use detrended fluctuation analysis (DFA) to calculate the Hurst exponent $H_i$ for
each word in order to quantify the long-term correlations ("memory") in each $f_i(t)$ time series. Fig. S3 shows the probability density function
$P(H)$ of $H_i$ values calculated for the relatively common words found in English fiction and Spanish, summarized in Table S2.
FIG. S3: Individual Hurst exponents $H_i$ indicate a strong positively correlated memory underlying word use dynamics. Results of
detrended fluctuation analysis (DFA) [35–37] on the common [dataset (ii)] words analyzed in Fig. 6(b) show strong long-term memory with
positive correlations, since $H > 1/2$, indicating strong correlated bursting in the dynamics of word use, likely compounded by historical,
social, or technological events. We calculate $\langle H_i \rangle \pm \sigma = 0.77 \pm 0.23$ (Eng. fiction) and $\langle H_i \rangle \pm \sigma = 0.90 \pm 0.29$ (Spanish). The size-variance $\beta$
values calculated from the data in Fig. 6 confirm the theoretical prediction $\langle H \rangle = 1 - \beta$ in [35]. Fig. 6 shows that $\beta_{\mathrm{Eng.fict}} \approx 0.21 \pm 0.01$
and $\beta_{\mathrm{Spa.}} \approx 0.10 \pm 0.01$. For the shuffled time series, we calculate $\langle H_i \rangle \pm \sigma = 0.55 \pm 0.07$ (Eng. fiction) and $\langle H_i \rangle \pm \sigma = 0.55 \pm 0.08$
(Spanish), which are consistent with time series that lack temporal ordering (memory).
FIG. S4: Statistical patterns in the growth trajectories of new words in the English corpus. Characteristics of the time-dependent word
trajectory show the time scales over which a typical word becomes relevant or fades. For 4 values of $T_c$, we show the word trajectories for
dataset (i) words in the English corpus, although the same qualitative results hold for the other languages analyzed. Recall that $T_c$ refers
to the subset of time series with lifetime $T_i > T_c$, so that two trajectories calculated using different thresholds $T_c$ and $T_c'$ only vary for
$\tau < \mathrm{Max}[T_c, T_c']$. We show weighted average and standard deviations, using $\langle f_i \rangle$ as the weight for word $i$ contributing to the calculation
of each time series in year $\tau$. (a) The relative use increases with time, consistent with the definition of the weighted average which biases
towards words with large $\langle f_i \rangle$. For words with large $T_i$, the trajectory has a minimum which begins to reverse around $\tau \approx 40$ years, possibly
reflecting the amount of time it takes to reach a critical utility threshold that corresponds to a relatively high fitness value for the word in
relation to its competitors. (b) The variations in $\langle f'(\tau|T_c) \rangle$ decrease with time reflecting the transition from the insecure "infant" phase to the
more secure "adult" phase in the lifetime trajectory. (c) The average growth trajectory is qualitatively related to the logarithmic derivative of
the curve in panel (a), and confirms that the region of largest positive growth is $\tau \approx 30$-$50$ years. (d) The variations in the average trajectory
are larger than $1.25\sigma$ for $30 < \tau < 50$ years and are larger than $1.0\sigma$ for $10 < \tau < 80$ years. This regime of large fluctuations in the growth
rates conceivably corresponds to the time period over which a successful word is accepted into the standard lexicon, e.g. a word included in
an official dictionary or an idea/event recorded in an encyclopedia or review.
FIG. S5: Statistical patterns in the growth trajectories of new words in the English fiction corpus.
FIG. S6: Statistical patterns in the growth trajectories of new words in the Spanish corpus.
FIG. S7: Statistical patterns in the growth trajectories of new words in the Hebrew corpus.



TABLE S1: Summary of annual growth trajectory data for varying threshold $T_c$, and $s_c = 0.2$, $Y_0 = 1800$ and $Y_f = 2008$.

                                                            Annual growth R(t) data
Corpus            T_c       N_t       % (of all     N_R            ⟨R⟩             σ[R]
(1-grams)         (years)   (words)   words)        (values)
--------------------------------------------------------------------------------------
English            25       302,957   4.1           31,544,800      2.4 × 10^-?    1.00
English fiction    25        99,547   3.8           11,725,984     -3.0 × 10^-3    1.00
Spanish            25        48,473   2.2            4,442,073      1.8 × 10^-3    1.00
Hebrew             25        29,825   4.6            2,424,912     -3.6 × 10^-3    1.00
English            50       204,969   2.8           28,071,528     -1.7 × 10^-3    1.00
English fiction    50        72,888   2.8           10,802,289     -1.7 × 10^-3    1.00
Spanish            50        33,236   1.5            3,892,745     -9.3 × 10^-?    1.00
Hebrew             50        27,918   4.3            2,347,839     -5.2 × 10^-3    1.00
English           100       141,073   1.9           23,928,600      1.0 × 10^-4    1.00
English fiction   100        53,847   2.1            9,535,037     -8.5 × 10^-?    1.00
Spanish           100        18,665   0.84           2,888,763     -2.2 × 10^-3    1.00
Hebrew            100         4,333   0.67             657,345     -9.7 × 10^-3    1.00
English           200        46,562   0.63           9,536,204     -3.8 × 10^-3    1.00
English fiction   200        21,322   0.82           4,365,194     -3.5 × 10^-3    1.00
Spanish           200         2,131   0.10             435,325     -3.1 × 10^-3    1.00
Hebrew            200           364   0.06              74,493     -1.4 × 10^-2    1.00


TABLE S2: Summary of data for the relatively common words that meet the criterion that their average word use $\langle f_i \rangle$ over the entire word
history is larger than a threshold $f_c$, defined for each corpus. In order to select relatively frequently used words, we use the following three
criteria: the word lifetime $T_i > 10$ years, $1800 \leq t \leq 2008$, and $\langle f_i \rangle > f_c$.

                                 Data summary for relatively common words
Corpus            f_c          N_t       % (of all    N_r'          ⟨r'⟩            σ[r']
(1-grams)                      (words)   words)       (values)
----------------------------------------------------------------------------------------
English           5 × 10^-?    106,732   1.45         16,568,726    1.19 × 10^-2    0.98
English fiction   1 × 10^-?     98,601   3.77         15,085,368    5.64 × 10^-3    0.97
Spanish           1 × 10^-?      2,763   0.124           473,302    9.00 × 10^-3    0.96
Hebrew            1 × 10^-?         70   0.011             6,395    3.49 × 10^-2    1.00


TABLE S3: Summary of Google corpus data. Annual growth rates correspond to data in the 209-year period 1800-2008.

                  Annual use u_i(t) 1-gram data                               Annual growth r(t) data
Corpus            N_u            Y_0    Y_f    N_w         Max[u_i(t)]      N_r            ⟨r⟩             σ[r]
(1-grams)         (uses)                       (words)                      (values)
---------------------------------------------------------------------------------------------------------------
English           3.60 × 10^11   1520   2008   7,380,256   824,591,289      310,987,181    2.21 × 10^-2    0.98
English fiction   8.91 × 10^10   1592   2009   2,612,490   271,039,542      122,304,632    2.32 × 10^-2    1.03
Spanish           4.51 × 10^10   1532   2008   2,233,564    74,053,477      111,333,992    7.51 × 10^-3    0.91
Hebrew            2.85 × 10^9    1539   2008     645,262     5,587,042       32,387,825    9.11 × 10^-3    0.90


